ABSTRACT


The main purpose of the present work is to develop a theory for multiple knowledge systems. A 
knowledge system could be a sensor or an expert system, but it must specialize in one feature. The 
problem is that we have an exhaustive list of possible answers to some query (such as "What object is 
it?"). By collecting different feature values, we should, in principle, be able to give an answer to the 
query, or at least narrow down the list. 

Since a sensor, or for that matter an expert system, does not in most cases yield a precise value for
the feature, uncertainty must be built into the model. We also need a formal mechanism for
combining the information. We chose the Dempster-Shafer approach to handle both
problems.

We introduce the concept of a state of recognition and point out that there is a relation between 
receiving updates and defining a set valued Markov Chain. Also, deciding what the value of the next 
set valued variable is can be phrased in terms of classical decision making theory such as minimizing 
the maximum regret. Other related problems are also examined.
INTRODUCTION


The purpose of the present work is to show how, by taking independent and very diverse evidence,
we can piece things together to arrive at an answer to the question "What object is it?". We will take
the Dempster-Shafer approach to combine the evidence. Such an approach has recently been
taken in expert systems; see [2], [3], and [4]. However, to the best of this writer's knowledge, the
results shown here are original. We start out with a simple example.

Consider the following data, which assign masses to subsets of {Bird, Plane, Superman}
according to the observed velocity (in mph):

| VELOCITY | B | P | S | {BP} | {BS} | {PS} | {BPS} |
|---|---|---|---|---|---|---|---|
| 0-100 | .5 | .1 | .1 | .2 | .04 | .04 | .02 |
| 101-200 | 0 | .4 | .1 | 0 | 0 | .5 | 0 |
| 201-500 | 0 | .5 | .1 | 0 | 0 | .4 | 0 |
| > 500 | 0 | .1 | .7 | 0 | 0 | .2 | 0 |


NOTE: 

• Birds don’t fly with velocity > 100. 

• Superman likes to fly at over 500 but he can fly at any speed he wants to. 


Note that the masses in each row sum to 1. To interpret the table, consider velocities exceeding
500 mph: the expert believes most strongly (mass .7) that the object is Superman. The expert does not
totally rule out a plane, assigning a mass of .1 to that event, and is somewhat unsure whether the
object is a plane or Superman, assigning a mass of .2 to that aggregate. Note that the mass of {PS} is
not the sum of the masses of P and of S, as it would be for a probability. Masses assigned to sets that
are not singletons denote the uncertainty of the expert. For example, the .02 assigned to {BPS}
reflects the degree of total ignorance that the expert has regarding what the object is when it travels
at 100 mph or less. Assigning mass in this way is typical of the Dempster-Shafer approach to
handling uncertainty in expert systems; see [10].
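As a minimal sketch of how such a table might be stored, the Python fragment below encodes each velocity row as a dictionary from subsets to masses and checks the two properties just described. The names `VELOCITY` and `is_mass_function` are illustrative assumptions and do not come from the original model.

```python
# Hypothetical encoding of the VELOCITY table: focal sets map to masses.
B, P, S = "Bird", "Plane", "Superman"

VELOCITY = {
    "0-100": {frozenset({B}): .5, frozenset({P}): .1, frozenset({S}): .1,
              frozenset({B, P}): .2, frozenset({B, S}): .04,
              frozenset({P, S}): .04, frozenset({B, P, S}): .02},
    "101-200": {frozenset({P}): .4, frozenset({S}): .1, frozenset({P, S}): .5},
    "201-500": {frozenset({P}): .5, frozenset({S}): .1, frozenset({P, S}): .4},
    ">500": {frozenset({P}): .1, frozenset({S}): .7, frozenset({P, S}): .2},
}


def is_mass_function(m, tol=1e-9):
    """True if no mass sits on the empty set and the masses sum to 1."""
    return frozenset() not in m and abs(sum(m.values()) - 1.0) < tol


row = VELOCITY[">500"]
# The mass on {P, S} is not the sum of the masses on {P} and {S}.
ps = row[frozenset({P, S})]
p_plus_s = row[frozenset({P})] + row[frozenset({S})]
```

Zero entries of the table are simply omitted from the dictionaries, so only focal elements are stored.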


We now write down the data relative to observed color: 


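| COLOR | B | P | S | {BP} | {BS} | {PS} | {BPS} |
|---|---|---|---|---|---|---|---|
| SILVER | .05 | .6 | .05 | .1 | 0 | .15 | .05 |
| WHITE | .1 | .1 | .05 | .5 | .05 | .15 | .05 |
| RED | .1 | .1 | .1 | .2 | .2 | .2 | .1 |
| BLUE | .1 | .1 | .1 | .2 | .2 | .2 | .1 |
| RED-BLUE | .04 | .04 | .8 | 0 | .05 | .05 | .02 |
| OTHER | .6 | .1 | 0 | .3 | 0 | 0 | 0 |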


NOTE: 

• Red and blue generate the same (conditional) mass 

• A gray bird may appear silver 

• Superman wears red and blue but from some angles he may appear all red or all blue 

• When flying at certain speeds, Superman may appear as a white or silver streak 

• Color other than red, blue, white, silver rules out Superman 

These two tables sum up the information collected from the experts. What we would like to do, 
of course, is to put these two pieces of evidence together. 
CONCEPTS AND NOTATIONS


We now formally define the concept of a mass function. A mass function is a function $m$ from
subsets of the frame of discernment $\Theta$ into [0, 1] satisfying the following conditions:

(i) $m(\emptyset) = 0$

(ii) $\sum_{A \subseteq \Theta} m(A) = 1$, where the sum is over all subsets of $\Theta$.

If $m_1$ and $m_2$ are two mass functions, we define, for every nonempty $C$,

$$(m_1 \oplus m_2)(C) = \frac{\sum_{A \cap B = C} m_1(A)\, m_2(B)}{1 - k}$$

where $k$ is the conflict

$$k = \sum_{A \cap B = \emptyset} m_1(A)\, m_2(B).$$

The operation defined above specifies how to put information together. If two knowledge systems
generate $m_1$ and $m_2$, then $m_1 \oplus m_2$ is the mass generated by combining the two knowledge systems; see
[10]. For a very readable interpretation of the combination rule in the setting of databases, see [12].
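The combination rule can be sketched in a few lines of Python. The fragment below is an illustration under stated assumptions rather than the author's implementation: mass functions are represented as dictionaries keyed by `frozenset`, and the function name `combine` is hypothetical. It combines the 101-200 velocity row with the WHITE color row from the two tables above.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule: return (m1 (+) m2, conflict k).

    Mass functions are dicts mapping frozensets (focal elements) to masses.
    """
    raw, k = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            raw[c] = raw.get(c, 0.0) + x * y   # numerator of the rule
        else:
            k += x * y                         # conflicting mass
    if k >= 1.0:
        raise ValueError("total conflict: the combination is undefined")
    return {c: v / (1.0 - k) for c, v in raw.items()}, k


fs = frozenset  # "B", "P", "S" stand for Bird, Plane, Superman
velocity_101_200 = {fs("P"): .4, fs("S"): .1, fs("PS"): .5}
color_white = {fs("B"): .1, fs("P"): .1, fs("S"): .05, fs("BP"): .5,
               fs("BS"): .05, fs("PS"): .15, fs("BPS"): .05}

m, k = combine(velocity_101_200, color_white)
# m[{P}] is about .775, m[{S}] about .1, m[{P, S}] about .125; k is about .2
```

Here a conflict of about .2 is discarded and the remaining mass of .8 is rescaled to 1, which is what the division by $1 - k$ accomplishes.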


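The belief generated by $m$ is defined by

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B),$$

the sum being over all sets $B$ such that $B \subseteq A$. We also define the plausibility by

$$\mathrm{Pls}(A) = \sum_{B \cap A \neq \emptyset} m(B),$$

the sum being over all sets $B$ such that $B \cap A \neq \emptyset$.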
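Both quantities are simple sums over focal elements. The following sketch assumes the same dictionary-of-frozensets representation as before; the function names `bel` and `pls` and the sample mass are chosen for illustration only.

```python
def bel(m, a):
    """Belief: total mass of the focal elements contained in a."""
    return sum(v for b, v in m.items() if b <= a)


def pls(m, a):
    """Plausibility: total mass of the focal elements that meet a."""
    return sum(v for b, v in m.items() if b & a)


fs = frozenset
# A mass on the frame {B, P, S}, chosen for illustration.
m = {fs("P"): .775, fs("S"): .1, fs("PS"): .125}

bel_p = bel(m, fs("P"))   # about .775: only {P} lies inside {P}
pls_p = pls(m, fs("P"))   # about .9: {P} and {P, S} both meet {P}
```

In this example the plausibility of P equals one minus the belief in its complement {B, S}, which reflects the general identity $\mathrm{Pls}(A) = 1 - \mathrm{Bel}(\neg A)$.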


Now if the $i$-th sighting takes place at time $t_i$, set $dt_i = t_i - t_{i-1}$.

Thus $dt_i$ denotes the elapsed time between sightings. Assume that we have a weight
function $\lambda(\cdot)$ satisfying

(i) $0 \le \lambda(dt_i) \le 1$, $dt_i \ge 0$

(ii) $\lambda(dt_i)$ is non-decreasing as a function of $dt_i$.

We use the weight to adjust the masses. Let $m(\cdot)$ denote the mass obtained from the current ($i$-th)
sighting alone (for the first sighting, $m_1 = m$). There are two ways to adjust:

a) $m_i(\cdot) = \lambda(dt_i)\, m(\cdot) + (1 - \lambda(dt_i))\, m_{i-1}(\cdot)$

b) $m_i(\cdot) = \lambda(dt_i)\, m(\cdot) + (1 - \lambda(dt_i))\, \bar{m}_{i-1}(\cdot)$, where $\bar{m}_{i-1}$ stands for the mass accumulated over all previous sightings.

If $\lambda$ is high, i.e., $dt_i$ is high, go with the current observation.
If $\lambda$ is low, i.e., $dt_i$ is low, go with the accumulated data.

Note that the first update is Markov-like, as it uses only the mass collected on the previous sighting.
For our example, we could define the weight function (with $dt_i$ in seconds) by:

$$\lambda(dt_i) = \begin{cases} dt_i / 300 & \text{if } dt_i < 300 \\ 1 & \text{otherwise} \end{cases}$$

That is, after 5 minutes, forget the previous observations and assume that a new object is being
observed. The rationale for this is that the data have become too old to be reliable.

Going back to our example of bird, plane, and Superman, assume we have three sightings: 


| SIGHTING | TIME | $dt_i$ | VELOCITY | COLOR |
|---|---|---|---|---|
| 1 | 1:00 p.m. | 0 | 101-200 | WHITE |
| 2 | 1:01 p.m. | 60 | 201-500 | WHITE |
| 3 | 1:29 p.m. | 1740 | 0-100 | OTHER |


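The combined masses (velocity combined with color), not time adjusted, are given below:

| SIGHTING | MASS | B | P | S | {BP} | {BS} | {PS} | {BPS} |
|---|---|---|---|---|---|---|---|---|
| 1 | $m_1(\cdot)$ | 0 | .7750 | .1 | 0 | 0 | .125 | 0 |
| 2 | $m_2(\cdot)$ | 0 | .8101 | .0886 | 0 | 0 | .1013 | 0 |
| 3 | $m_3(\cdot)$ | .811 | .1024 | 0 | .0866 | 0 | 0 | 0 |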


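The combined masses, time adjusted, are given below:

| SIGHTING | MASS | B | P | S | {BP} | {BS} | {PS} | {BPS} |
|---|---|---|---|---|---|---|---|---|
| 1 | $m_1(\cdot)$ | 0 | .775 | .1 | 0 | 0 | .125 | 0 |
| 2 | $m_2(\cdot)$ | 0 | .78202 | .09772 | 0 | 0 | .12026 | 0 |
| 3 | $m_3(\cdot)$ | .811 | .1024 | 0 | .0866 | 0 | 0 | 0 |

NOTE: $m_3$ is not really time adjusted, as the sightings are more than 5 minutes apart.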
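The time-adjusted rows can be reproduced with a short sketch of update a). The function names `weight` and `time_adjust` and the `horizon` parameter are assumptions made for this illustration; the numbers are the unadjusted masses listed above, so the results agree with the table only up to its rounding.

```python
def weight(dt, horizon=300.0):
    """Example weight function: dt / 300 below 300 seconds, 1 otherwise."""
    return dt / horizon if dt < horizon else 1.0


def time_adjust(current, previous, dt):
    """Update a): lambda * current + (1 - lambda) * previous (Markov-like)."""
    lam = weight(dt)
    keys = set(current) | set(previous)
    return {a: lam * current.get(a, 0.0) + (1.0 - lam) * previous.get(a, 0.0)
            for a in keys}


fs = frozenset
m1 = {fs("P"): .775, fs("S"): .1, fs("PS"): .125}         # sighting 1
raw2 = {fs("P"): .8101, fs("S"): .0886, fs("PS"): .1013}  # sighting 2, unadjusted
raw3 = {fs("B"): .811, fs("P"): .1024, fs("BP"): .0866}   # sighting 3, unadjusted

m2 = time_adjust(raw2, m1, dt=60)     # lambda = .2
m3 = time_adjust(raw3, m2, dt=1740)   # lambda = 1: only raw3 survives
```

For the second sighting the weight is 60/300 = .2, so, for instance, the mass on P becomes .2 × .8101 + .8 × .775 = .78202.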
Computing the belief, at each sighting, with respect to the time-adjusted mass, we have:

| SIGHTING | OBJECT | Bel(·) | Bel(¬·) | Bel(·) - Bel(¬·) | CLASSIFICATION |
|---|---|---|---|---|---|
| 1 | B | 0 | 1 | -1 | |
| 1 | P | .775 | .1 | .675 | P |
| 1 | S | .1 | .775 | -.675 | |
| 2 | B | 0 | 1 | -1 | |
| 2 | P | .78202 | .09772 | .6843 | P |
| 2 | S | .09772 | .78202 | -.6843 | |
| 3 | B | .811 | .1024 | .7086 | B |
| 3 | P | .1024 | .811 | -.7086 | |
| 3 | S | 0 | 1 | -1 | |

The rationale for the table above is that Bel(·) - Bel(¬·) measures by how much the belief in a
specific object exceeds the belief in its competition. This criterion was already used in [8]. Thus the
conclusion is that a plane was observed on the first and second sightings and a bird was observed on
the last sighting. This example shows that there is a payoff in studying a multi-knowledge-system
setting. We also remark that a similar, but somewhat more complex, approach could be used to obtain
classification sequences.
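The classification criterion can be sketched as follows. This is an illustration under assumptions: the helper names `bel` and `classify` are hypothetical, and the three dictionaries simply restate the time-adjusted masses of the three sightings.

```python
FRAME = frozenset("BPS")  # Bird, Plane, Superman


def bel(m, a):
    """Belief: total mass of the focal elements contained in a."""
    return sum(v for b, v in m.items() if b <= a)


def classify(m, frame=FRAME):
    """Return the object maximizing Bel({x}) - Bel(frame minus {x}), with all scores."""
    scores = {x: bel(m, frozenset(x)) - bel(m, frame - {x}) for x in frame}
    return max(scores, key=scores.get), scores


fs = frozenset
adjusted = [
    {fs("P"): .775, fs("S"): .1, fs("PS"): .125},          # sighting 1
    {fs("P"): .78202, fs("S"): .09772, fs("PS"): .12026},  # sighting 2
    {fs("B"): .811, fs("P"): .1024, fs("BP"): .0866},      # sighting 3
]
labels = [classify(m)[0] for m in adjusted]
```

With these inputs the labels come out as P, P, B, matching the CLASSIFICATION column.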

MESHING THE INFORMATION COLLECTED FROM MULTIPLE KNOWLEDGE SYSTEMS 


In this section, we consider the composition rule to be defined by the numerator only of
$(m_1 \oplus m_2)(C)$. (Thus the empty set may pick up mass.)

We now shift our perspective somewhat. Consider knowledge systems $KS_1, KS_2, \ldots, KS_n$.

$KS_j$ reads the $j$-th feature and interprets its value to be $f_i$ with probability $\alpha_{ji}$.

It is important to keep in mind that in this setting, each knowledge system specializes in recognizing a
specific feature.

Thus, $KS_j$ defines the mass on $\Theta$ by

$$m_j(A_{ji}) = \alpha_{ji},$$

where $A_{ji}$ denotes all objects of $\Theta$ whose $j$-th feature has value $f_i$.

After we have interrogated $KS_1, KS_2, \ldots, KS_q$, the possible answers are in sets
$A_{1t_1} \cap A_{2t_2} \cap \cdots \cap A_{qt_q}$. Let $X_q$ be the set-valued variable whose values are
$A_{1t_1} \cap \cdots \cap A_{qt_q}$ ($t_1, \ldots, t_q$ range over all possible values of the corresponding features).
$X_q$ indicates the current state of recognition. $X_q$ may be viewed as a random set [9].

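We have shown that $X_q$ forms a (non-stationary) Markov chain. In fact, the transition
probabilities are given by

$$\Pr\left( X_{q+1} = A_{1t_1} \cap \cdots \cap A_{qt_q} \cap A_{q+1,t_{q+1}} \mid X_q = A_{1t_1} \cap \cdots \cap A_{qt_q} \right) = \sum m_{q+1}(B_{q+1,i}),$$

where the sum is over the focal elements $B_{q+1,i}$ of $m_{q+1}$ whose intersection with
$A_{1t_1} \cap \cdots \cap A_{qt_q}$ is $A_{1t_1} \cap \cdots \cap A_{qt_q} \cap A_{q+1,t_{q+1}}$. Consequently, the distribution $\nu_q$ of $X_q$ satisfies

$$\nu_{q+1}(A) = \sum_{B} \Pr\left( X_{q+1} = A \mid X_q = B \right) \nu_q(B),$$

where $A$ is of the form $A_{1t_1} \cap \cdots \cap A_{qt_q} \cap A_{q+1,t_{q+1}}$ and $B$ is of the form $A_{1u_1} \cap \cdots \cap A_{qu_q}$
with $B \cap A = A$.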

Since $X_q$ forms a Markov chain, a study of absorbing sets as well as entry and exit times could
be made. We choose not to deal with these rather general questions but rather to pose some specific
problems, such as: What is the probability of realizing for the first time, as we interrogate $KS_{q+1}$,
that the answer is not in the frame of discernment? What is the probability of getting no information
from $KS_{q+1}$? (Of course, we assume that $KS_1, KS_2, \ldots, KS_q$ were already interrogated.) The
answers to these questions have been derived and are given below.

$$\Pr(\text{realizing for the first time at time } q+1 \text{ that the answer is not in the frame of discernment}) = \sum m_{q+1}(B_{q+1,i})\, \nu_q\left( A_{1t_1} \cap \cdots \cap A_{qt_q} \right),$$

where the sum is over all focal elements $B_{q+1,i}$ of $m_{q+1}$ such that

$$A_{1t_1} \cap \cdots \cap A_{qt_q} \cap B_{q+1,i} = \emptyset \quad \text{yet} \quad A_{1t_1} \cap \cdots \cap A_{qt_q} \neq \emptyset.$$

That is, the averaged $\mathrm{Bel}_{q+1}$ of being outside the range of $X_q$ when $KS_{q+1}$ is interrogated.

$$\Pr(\text{no information from } KS_{q+1}) = \sum m_{q+1}(B_{q+1,i})\, \nu_q(A),$$

where the sum is over $B_{q+1,i}$ a superset of $A$, and $A$ is of the form $A_{1t_1} \cap \cdots \cap A_{qt_q}$.
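The two probabilities above, together with the recursion for the distribution of the state of recognition, can be illustrated with a small sketch. Everything named here is an assumption made for the illustration: the functions `step`, `pr_first_outside` and `pr_no_info`, the two features, and their masses. The sketch takes the transition from a state $B$ to $B \cap F$ to occur with probability $m_{q+1}(F)$ for each focal element $F$, in line with the numerator-only rule of this section.

```python
def step(nu, m_next):
    """Distribution of X_{q+1} from that of X_q: state b moves to b & f with
    probability m_next[f], so the empty set may pick up mass."""
    out = {}
    for b, pb in nu.items():
        for f, mf in m_next.items():
            out[b & f] = out.get(b & f, 0.0) + pb * mf
    return out


def pr_first_outside(nu, m_next):
    """Probability of reaching the empty set at this step from a nonempty state."""
    return sum(pb * mf for b, pb in nu.items() if b
               for f, mf in m_next.items() if not (b & f))


def pr_no_info(nu, m_next):
    """Probability that the new focal element contains the current state."""
    return sum(pb * mf for b, pb in nu.items()
               for f, mf in m_next.items() if b <= f)


fs = frozenset
# Hypothetical two-feature example on the frame {bird, plane, superman}.
m1 = {fs({"plane"}): .7, fs({"bird", "superman"}): .3}   # KS_1: "has an engine?"
m2 = {fs({"superman"}): .2, fs({"bird", "plane"}): .8}   # KS_2: "wears a cape?"

nu1 = dict(m1)        # distribution of X_1
nu2 = step(nu1, m2)   # distribution of X_2
```

In this toy example the reading "has an engine" followed by "wears a cape" matches no object, so a mass of .7 × .2 = .14 lands on the empty set, while the reading "has an engine" followed by "no cape" leaves the state {plane} unchanged with probability .7 × .8 = .56.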
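Using the transition probabilities, we have

$$\Pr\left( X_{q+1} = A_{q+1,t_{q+1}} \cap \cdots \cap A_{1t_1}, \; \ldots, \; X_1 = A_{1t_1} \right) = m_1(A_{1t_1}) \left[ \sum m_2(B) \right] \cdots \left[ \sum m_{q+1}(B) \right],$$

where the first sum is over all $B$ such that $B \cap A_{1t_1} = A_{2t_2} \cap A_{1t_1}$, and the last sum is over all $B$ such that

$$B \cap A_{qt_q} \cap \cdots \cap A_{1t_1} = A_{q+1,t_{q+1}} \cap \cdots \cap A_{1t_1}.$$

All of this points out that it is very important to carefully evaluate $X_q$.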
TRUNCATING THE INFORMATION

Let $\{A_1, A_2, \ldots\}$ be distinct focal elements of $m_1, m_2, \ldots$ We can view $KS_t$ as confirming
$A_i$ to the degree $\alpha_{ti}$. Let $\alpha_{ti^*}$ be the largest $\alpha_{ti}$ for $t$ fixed ($i^*$ depends on $t$).

Now view $KS_t$ as confirming $A_{i^*}$ to the degree $\alpha_{ti^*}$ and ignore the rest of the information
yielded by $KS_t$ (i.e., take only the highest confirmation of $KS_t$). If $s_1, s_2, \ldots, s_k$ supported $A_{i^*}$, then
$A_{i^*}$ is supported to the degree $1 - (1 - s_1)(1 - s_2) \cdots (1 - s_k)$.

If the resulting mass on $A_i$ is

$$\mathrm{Evi}_i(A_i) = p_i, \quad \text{we set} \quad \mathrm{Evi}_i(\{A_1, \ldots, A_n\}) = r_i, \quad \text{where } p_i + r_i = 1.$$

The rationale for doing this is to trust our estimate of the mass on each $A_i$, which came from the
highest degrees of confirmation, and to ignore the rest, i.e., spread the rest of the mass on all the
possibilities.

It can be shown, see [1], that

$$\mathrm{Bel}(A_j) = K\, p_j \prod_{i \neq j} r_i, \quad \text{where} \quad K^{-1} = \prod_i r_i \left( 1 + \sum_j \frac{p_j}{r_j} \right).$$

At stage $q + 1$, we then pick the $A_j$ maximizing Bel coming from $m_1, m_2, \ldots, m_{q+1}$. In this way,
prior information given by $m_1, \ldots, m_q$, as well as the current information yielded by $m_{q+1}$, is taken
into account.
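A minimal sketch of the truncation step follows, assuming that the $A_i$ are mutually exclusive and that every $p_j < 1$. The function names `support` and `truncated_beliefs` are hypothetical; the closed form is the one quoted above from [1].

```python
from math import prod


def support(degrees):
    """Degree to which independent confirmations s_1..s_k support one A_i."""
    return 1.0 - prod(1.0 - s for s in degrees)


def truncated_beliefs(p):
    """Bel(A_j) = K * p_j * prod_{i != j} r_i, with r_i = 1 - p_i.

    p[j] is the mass kept on A_j; the remainder r[j] is spread over
    the whole set {A_1, ..., A_n}. Assumes every p[j] < 1.
    """
    r = [1.0 - x for x in p]
    total_r = prod(r)
    k_inv = total_r * (1.0 + sum(x / y for x, y in zip(p, r)))
    return [x * total_r / y / k_inv for x, y in zip(p, r)]


s = support([0.5, 0.5])                 # 1 - (1 - .5)(1 - .5) = .75
beliefs = truncated_beliefs([0.5, 0.2])
best = max(range(len(beliefs)), key=beliefs.__getitem__)  # index of the A_j to pick
```

For p = (.5, .2) the conflict between the two singletons is .5 × .2 = .1, so the beliefs are .4/.9 and .1/.9, the same values obtained by applying the combination rule directly.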

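We can extend this to keeping the two highest confirmations. $KS_t$, as mentioned earlier, assigns
$\alpha_{ti^*}$ to $A_{i^*}$, and the second highest $\alpha_{ti}$, call it $\beta_{tj^*}$, is assigned to

$\{A_i \mid i \neq i^*\}$ (spread around the 2nd highest).

The rest, $1 - \alpha_{ti^*} - \beta_{tj^*}$, is assigned to $\{A_1, \ldots, A_n\}$.

It can be shown, see [1], that

$$\mathrm{Bel}(A_j) = K \left[ p_j \prod_{i \neq j} d_i + r_j \prod_{i \neq j} c_i \right],$$

$$K^{-1} = \prod_i d_i \left( 1 + \sum_i \frac{p_i}{d_i} \right) - \prod_i c_i,$$

where $p_i + c_i + r_i = 1$ and $d_i = c_i + r_i$.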
APPLYING CLASSICAL DECISION THEORY TO SELECT VALUES FOR $X_{q+1}$


If $KS_1, \ldots, KS_q$ yield enough information so that the probabilities

$$\Pr\left( X_q = A_{1t_1} \cap \cdots \cap A_{qt_q} \right)$$

can be trusted, then maximize over $B_{q+1,i}$ the following expression:

$$\sum_{t_1, \ldots, t_q} \Pr\left( X_q = A_{1t_1} \cap \cdots \cap A_{qt_q} \right) \mathrm{Belief}_{q+1}\left( A_{1t_1} \cap \cdots \cap A_{qt_q} \cap B_{q+1,i} \right).$$

$\mathrm{Belief}_{q+1}$ is generated by $m_1 \oplus \cdots \oplus m_{q+1}$.

We view this as making a decision in the environment $A_{1t_1} \cap \cdots \cap A_{qt_q}$, assuming the
probabilities are known. Picking the alternative $B_{q+1,i}$ yields a payoff of
$\mathrm{Belief}_{q+1}(A_{1t_1} \cap \cdots \cap A_{qt_q} \cap B_{q+1,i})$, and we maximize the averaged payoff. What if
$KS_1, \ldots, KS_q$ are not too reliable? We know the patterns but we are not sure about their probabilities.
Pick the alternative $B_{q+1,i}$ so as to maximize the minimum of

$$\mathrm{Bel}_{q+1}\left( A_{1t_1} \cap \cdots \cap A_{qt_q} \cap B_{q+1,i} \right) - \mathrm{Bel}_{q+1}\left( A_{1t_1} \cap \cdots \cap A_{qt_q} \cap \bigcup_{j \neq i} B_{q+1,j} \right).$$

Here the minimum is taken over all environments $A_{1t_1} \cap \cdots \cap A_{qt_q}$. $\mathrm{Bel}_{q+1}$ is generated
by $m_{q+1}$. The motivation is that picking the minimum represents the worst environment for the payoff
of alternative $B_{q+1,i}$ over competing alternatives. Picking the maximum then represents the
maximum gain. This approach is pessimistic in nature (going to the worst environment and then
making the best of a bad situation). At the other end of the spectrum, the maximum of the maximum
represents the optimistic attitude.

Picking a convex combination of the two represents a compromise. 
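The criteria described so far can be compared on a small payoff table. The sketch below is illustrative only: the payoffs, the environment probabilities, and the function names are all hypothetical, with `PAYOFF[i][e]` standing for the payoff of alternative `i` in environment `e`.

```python
# Hypothetical payoffs: PAYOFF[alternative][environment].
PAYOFF = {
    "B1": {"E1": 0.2, "E2": 0.6},
    "B2": {"E1": 0.3, "E2": 0.4},
    "B3": {"E1": -0.1, "E2": 0.9},
}


def expected(payoff, prob):
    """Averaged payoff, for use when the environment probabilities are trusted."""
    return max(payoff, key=lambda i: sum(prob[e] * v for e, v in payoff[i].items()))


def maximin(payoff):
    """Pessimistic attitude: best of the worst cases."""
    return max(payoff, key=lambda i: min(payoff[i].values()))


def maximax(payoff):
    """Optimistic attitude: best of the best cases."""
    return max(payoff, key=lambda i: max(payoff[i].values()))


def compromise(payoff, w):
    """Convex combination: weight w on the worst case, 1 - w on the best case."""
    def score(i):
        vals = payoff[i].values()
        return w * min(vals) + (1 - w) * max(vals)
    return max(payoff, key=score)


choices = (expected(PAYOFF, {"E1": 0.7, "E2": 0.3}), maximin(PAYOFF),
           maximax(PAYOFF), compromise(PAYOFF, 0.6))
```

On this table the pessimistic rule selects B2, the optimistic rule B3, and the compromise with weight .6 on the worst case B1, which shows that the attitude adopted can change the decision.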

Another approach yet is to minimize the largest regret. Let

$$T_i\left( A_{1t_1} \cap \cdots \cap A_{qt_q} \right) = D_i\left( A_{1t_1} \cap \cdots \cap A_{qt_q} \right) - \max_j D_j\left( A_{1t_1} \cap \cdots \cap A_{qt_q} \right),$$

where $D_i$ denotes the above difference of beliefs. $T_i$ measures the regret of picking
alternative $i$ over the best alternative ($T_i \le 0$). Picking the minimum over the environments
produces the largest regret. Picking then the maximum minimizes that largest regret.
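As a sketch of the regret criterion, assume a table `PAYOFF[i][e]` holding the difference of beliefs for alternative `i` in environment `e`. The numbers and the function names `regret_table` and `minimax_regret` are hypothetical.

```python
# Hypothetical differences of beliefs: PAYOFF[alternative][environment].
PAYOFF = {
    "B1": {"E1": 0.2, "E2": 0.6},
    "B2": {"E1": 0.3, "E2": 0.4},
    "B3": {"E1": -0.1, "E2": 0.9},
}


def regret_table(payoff):
    """T[i][e] = payoff[i][e] minus the best payoff in environment e (so T <= 0)."""
    envs = list(next(iter(payoff.values())))
    best = {e: max(row[e] for row in payoff.values()) for e in envs}
    return {i: {e: row[e] - best[e] for e in envs} for i, row in payoff.items()}


def minimax_regret(payoff):
    """Pick the alternative whose worst (most negative) regret is closest to zero."""
    t = regret_table(payoff)
    return max(t, key=lambda i: min(t[i].values()))


table = regret_table(PAYOFF)
choice = minimax_regret(PAYOFF)
```

Here the worst regrets are about -.3 for B1, -.5 for B2 and -.4 for B3, so B1 is selected.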
DECISION MAKING AND TRUNCATING THE INFORMATION


Here, we believe that some patterns are a definite possibility. We also want to ignore the rest of
the patterns, and we do not trust any probability functions associated with $KS_1, \ldots, KS_q$. Going
back to the previous section, we see that the four previous algorithms are well defined if we restrict the
environments $A_{1t_1} \cap \cdots \cap A_{qt_q}$ to a fixed set $P$. We now refine this by allowing $P$ to be a fuzzy set.
Thus

$$P = \sum \alpha_{q(t_1, \ldots, t_q)} \,/\, \left( A_{1t_1} \cap \cdots \cap A_{qt_q} \right).$$

Here $\alpha_{q(t_1, \ldots, t_q)}$ denotes the degree of membership of $A_{1t_1} \cap \cdots \cap A_{qt_q}$ in $P$.

We now must define an appropriate fuzzy set of payoffs. In the case of an optimistic, a
pessimistic, or a somewhere-in-between attitude, we consider

$$C_i(P) = \sum \alpha_{q(t_1, \ldots, t_q)} \,/\, \left[ \mathrm{Bel}_{q+1}\left( A_{1t_1} \cap \cdots \cap A_{qt_q} \cap B_{q+1,i} \right) - \mathrm{Bel}_{q+1}\left( A_{1t_1} \cap \cdots \cap A_{qt_q} \cap \bigcup_{j \neq i} B_{q+1,j} \right) \right].$$

We need to be able to take the minimum or maximum element of a fuzzy set. If

$$A = \sum_i \alpha_i / a_i,$$

set

$$\gamma(\lambda) = \min_{\alpha_i \ge \lambda} a_i.$$

The minimum of $A$ is defined as

$$\int_0^1 \gamma(\lambda)\, d\lambda.$$

This coincides with the minimum in the crisp case and is defined in [11]. Again, the four algorithms
defined earlier now make sense.

THE GENERAL CASE 

We interrogate $KS_1, \ldots, KS_q$ and split the corresponding patterns into disjoint blocks $P_k$.
The blocks could correspond to classifications such as highly likely, likely, somewhat likely patterns,
etc. We also assume that the $P_k$ are fuzzy sets (is $A_{1t_1} \cap \cdots \cap A_{qt_q}$ highly likely?). We set

$$P_k = \sum \alpha^k_{q(t_1, \ldots, t_q)} \,/\, \left( A_{1t_1} \cap \cdots \cap A_{qt_q} \right).$$

We also assign masses to each block, reflecting the weight put on the blocks (this reflects the
trust put on the corresponding patterns in the class). Let

$$m(P_k) = a_k.$$

The first criterion, for example, would maximize over the alternatives $B_{q+1,i}$

$$\sum_k m(P_k)\, h_i(k),$$

where $h_i(k)$ is the minimum of the fuzzy set

$$\sum \alpha^k_{q(t_1, \ldots, t_q)} \,/\, \left[ \mathrm{Bel}_{q+1}\left( A_{1t_1} \cap \cdots \cap A_{qt_q} \cap B_{q+1,i} \right) - \mathrm{Bel}_{q+1}\left( A_{1t_1} \cap \cdots \cap A_{qt_q} \cap \bigcup_{j \neq i} B_{q+1,j} \right) \right].$$

It is clear that the other three algorithms generalize to this situation. The sets are replaced by
"averages" and the minimum and maximum need to be taken over fuzzy sets, as explained earlier. For
other methods available in the setting of decision making, the reader is referred to [5], [6], and [7].

TOWARD A GENERAL THEORY OF MULTIPLE KNOWLEDGE SYSTEMS 


The previous discussion points out the importance of building a formal theory for the multiple 
knowledge systems setting. Our present work generalizes the situation described in [8] and 
constitutes the first steps toward such a theory. Our basic assumptions are: 

(i) Our knowledge systems are independent 

(ii) Each knowledge system specializes in one feature 

(iii) Each feature may have several knowledge systems assigned to it. 

We may not want to access all KS's and therefore, we have to solve the following problem:

The access problem: Which sets of KS's does one access? (Some KS's may run in parallel.)

In what order does one access these sets?

We have performance parameters such as reliability, expense, response time, etc. Information
regarding these parameters is contained in special KS's called CKS's (Control Knowledge Systems).

Each CKS specializes in one performance parameter.

One performance parameter may correspond to several CKS's.

Each KS has two components: 

a) The observational component which reports on the value of a specific feature. It may 
return a value or a probability distribution over the set of possible feature values (e.g., 
red or .8/red + .2/blue).

b) The judgemental component which reports on how likely it is that the true answer lies in 
some set of possible answers given that a specific value of a feature has been observed. 
We define a control strategy to be a sequence of performance parameter specifications. This
generates an access to a set of KS's. After these KS's have been used, the belief structure of the
frame of discernment is updated. Then stopping rules are looked at. If the stopping criteria are not met,
we go to the next control strategy.

If all control strategies have been exhausted, a decision is made as to what the probable answer is.

Access Policy

Each control strategy is a list of performance objectives. On the $l$-th control strategy, let $\Theta_l$
denote all available KS's, with $\Theta_1 \supseteq \Theta_2 \supseteq \cdots \supseteq \Theta_l \supseteq \cdots$, as we don't want to reuse the same KS's (we
want to have independent sources of information). The decision as to what KS's to use is made on
information contained in the CKS's.

Each CKS has two components:

a) Component A, which decides what the best subsets of $\Theta_l$ are to consider when the
value of performance objective $P_j$ is $p_j$. That is, if $CKS_j^1, \ldots, CKS_j^{r_j}$ specialize in performance $j$,
Component A computes all

$$\mu^{jt}_{p_j} : 2^{\Theta_l} \to [0, 1], \quad 1 \le t \le r_j.$$

We define $\mu^{jt}_{p_j}(B) = 0$ if $B$ contains any pair of KS's which can't run in parallel.
Thus $\mu^{jt}_{p_j}$ is non-zero only on sets of KS's that run in parallel.

b) Component B, which makes a probabilistic judgement of what the best value of
performance $P_j$ is, i.e., for $CKS_j^t$,

$$p^{t}_{jk} = \Pr\left( P_j = p_{jk} \mid CKS_j^t \right).$$

Let $B \subseteq \Theta_l$ (set of available KS's). Define

$$\pi_j\left( B \mid CKS_j^t \right) = \sum_k \mu^{jt}_{p_{jk}}(B)\, p^{t}_{jk}.$$

Here $\mu^{jt}_{p_{jk}}$ is given by component A of $CKS_j^t$ for each value $p_{jk}$, and $p^{t}_{jk}$ is given by component B
of $CKS_j^t$ for each value $p_{jk}$. In other words, the above expression represents how good, on the average,
the set $B$ is as determined by $CKS_j^t$.
We now want to mesh all the CKS's for a fixed performance:

$$\pi_j\left( B \mid CKS_j^1, \ldots, CKS_j^{r_j} \right) = \bigoplus_{t=1}^{r_j} \pi_j\left( B \mid CKS_j^t \right).$$

Now to mesh all the performance parameters:

$$\pi\left( B \mid \text{all the CKS's involved} \right) = \bigoplus_j \pi_j\left( B \mid CKS_j^1, \ldots, CKS_j^{r_j} \right).$$

Now look at all $B \subseteq \Theta_l$ made up of KS's that can be accessed in parallel. For such $B$'s, maximize

$$\mathrm{Bel}(B) - \mathrm{Bel}(\neg B).$$

At this point, we have picked a set of KS's to run in parallel.

We now have to interrogate the KS's that are in $B$ and update our belief structure on the frame of
discernment (and go from $\Theta_l$ to $\Theta_{l+1}$).

Now we have $KS_i^l$. Recall that it has an observational and a judgemental component.
The judgemental component is represented by

$$\nu^l_{i,f} : 2^{\Theta_0} \to [0, 1].$$

Here $\nu^l_{i,f}(A)$ represents the degree of belief that $A$ is the smallest set containing the right answer, given feature value $f$.

The observational component returns either a single value or, in the more complex case, a probability
distribution. The notation

$$KS^l_{i_1}, \ldots, KS^l_{i_s}$$

means that the KS's on the $l$-th strategy that are in the selected set $B$ report on features $i_1, i_2, \ldots, i_s$.
The observational components report

$$\Pr\left( f \mid KS^l_i \right)$$

(the $l$ index refers to the $l$-th control strategy).
Thus, we define masses (over fixed features)

$$m_{li}\left( A \mid KS^l_i \right) = \sum_f \Pr\left( f \mid KS^l_i \right) \nu^l_{i,f}(A)$$

(the averaged mass assigned by the judgemental component).

We now mesh over all features (determined by $B$, the selected set):

$$m_l\left( A \mid KS^l_{i_1}, \ldots, KS^l_{i_s} \right) = \bigoplus_{t=1}^{s} m_{l i_t}\left( A \mid KS^l_{i_t} \right).$$

At the end of the $L$-th control strategy, our total information is summed up by

$$m_L\left( A \mid \text{all KS's involved} \right) = \bigoplus_{l=1}^{L} m_l\left( A \mid KS^l_{i_1}, \ldots, KS^l_{i_s} \right).$$

We now must deal with the decision rule of what object must be selected as a plausible answer.
Step 1:

Let $a$ in $\Theta_0$ be the element maximizing

$$\mathrm{Bel}_L(a) - \mathrm{Bel}_L(\neg a).$$

If $L$ denotes the last control strategy, pick 'a';
else go to Step 2.

Step 2:

a) If $\delta_1 > 0$ denotes some fixed threshold,
pick 'a' if $\mathrm{Bel}_L(a) > \delta_1$.

b) If $\delta_2 > 0$ denotes some fixed number,
pick 'a' if $\mathrm{Pls}_L(a) - \mathrm{Bel}_L(a) < \delta_2$.

c) Criteria a) and b) combined.

If $a$ doesn't satisfy the criterion, go to the next control strategy. The rationale for stopping rule
c) is that we would like the belief in 'a' to exceed some threshold and the uncertainty relative to $a$ to
drop below some predefined level.
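The two steps can be sketched as a single function. This is an illustration under assumptions: the function `decide`, its signature, and the threshold values are hypothetical, stopping rule c) is the one implemented, and the sample mass restates the time-adjusted mass of the second sighting from the earlier example.

```python
def bel(m, a):
    """Belief: total mass of the focal elements contained in a."""
    return sum(v for b, v in m.items() if b <= a)


def pls(m, a):
    """Plausibility: total mass of the focal elements that meet a."""
    return sum(v for b, v in m.items() if b & a)


def decide(m, frame, delta1, delta2, last):
    """Return the selected object, or None to move on to the next control strategy."""
    # Step 1: element maximizing Bel(a) - Bel(not a).
    a = max(frame, key=lambda x: bel(m, frozenset({x})) - bel(m, frame - {x}))
    if last:
        return a
    single = frozenset({a})
    confident = bel(m, single) > delta1                  # Step 2 a)
    precise = pls(m, single) - bel(m, single) < delta2   # Step 2 b)
    return a if confident and precise else None          # Step 2 c)


fs = frozenset
frame = fs("BPS")
m = {fs("P"): .78202, fs("S"): .09772, fs("PS"): .12026}

accepted = decide(m, frame, delta1=0.75, delta2=0.15, last=False)
deferred = decide(m, frame, delta1=0.75, delta2=0.10, last=False)
forced = decide(m, frame, delta1=0.90, delta2=0.10, last=True)
```

With a belief of .78202 and an uncertainty gap of .12026 for P, the first call accepts P, the second defers to the next control strategy because the gap exceeds .10, and the third returns P regardless because it is the last strategy.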

It is clear that much research remains to be done. For example, degradation of the information
contained in the KS's has not been considered in the last part of this report. This and additional
problems will be addressed in future work. Finally, for applications of the Dempster-Shafer approach
to artificial intelligence, the reader is referred to [3] and [4].